Search CORE

702 research outputs found

Personalized Treatment-Response Trajectories: Errors-in-variables, Interpretability, and Causality

Author: Zhang Guangyi
Publication date: 17/06/2019

One fundamental problem in many applications is to estimate treatment-response trajectories given multidimensional treatment variables. However, in reality, the estimation suffers severely from measurement error both in treatment timing and covariates, for example when the treatment data are self-reported by users. We introduce a novel data-driven method to tackle this challenging problem, which models personalized treatment-response trajectories as a sum of a parametric response function, based on restored true treatment timing and covariates and sharing information across individuals under a hierarchical structure, and a counterfactual trend fitted by a sparse Gaussian Process. In a real-life dataset where the impact of diet on continuous blood glucose is estimated, our model achieves a superior performance in estimation accuracy and prediction

Aaltodoc Publication Archive

Sparse Convolution for Approximate Sparse Instance

Author: Li Xiaoxiao; Song Zhao; Zhang Guangyi
Publication date: 04/06/2023

Computing the convolution $A \star B$ of two vectors of dimension $n$ is one of the most important computational primitives in many fields. For the non-negative convolution scenario, the classical solution is to leverage the Fast Fourier Transform, whose time complexity is $O(n \log n)$. However, the vectors $A$ and $B$ could be very sparse, and we can exploit this property to accelerate the computation. In this paper, we show that when $\|A \star B\|_{\geq c_1} = k$ and $\|A \star B\|_{\leq c_2} = n-k$ hold, we can approximately recover all indices in $\mathrm{supp}_{\geq c_1}(A \star B)$ with point-wise error of $o(1)$ in $O(k \log (n) \log(k)\log(k/\delta))$ time. We further show that we can iteratively correct the error and recover all indices in $\mathrm{supp}_{\geq c_1}(A \star B)$ correctly in $O(k \log(n) \log^2(k) (\log(1/\delta) + \log\log(k)))$ time.
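To make the notation in this abstract concrete, here is a minimal Python sketch. It is not the paper's algorithm: it is a direct baseline that visits only the non-zero entries of the two vectors, so its cost grows with the product of their support sizes rather than with n. The function names sparse_convolve and support_at_least are hypothetical, and reading supp_{>= c}(A * B) as "the set of indices whose convolution value is at least c" is an assumption, since the abstract does not define it.

```python
from collections import defaultdict


def sparse_convolve(a, b):
    """Convolve two non-negative vectors, visiting only non-zero entries.

    Returns a dict mapping output index -> value; absent indices are zero.
    """
    nz_a = [(i, x) for i, x in enumerate(a) if x != 0]
    nz_b = [(j, y) for j, y in enumerate(b) if y != 0]
    out = defaultdict(float)
    for i, x in nz_a:
        for j, y in nz_b:
            out[i + j] += x * y
    return dict(out)


def support_at_least(conv, c):
    """Indices whose convolution value is at least c (assumed meaning of supp_{>= c})."""
    return sorted(idx for idx, value in conv.items() if value >= c)


# Two sparse vectors of dimension 8 with two non-zero entries each.
a = [0, 2, 0, 0, 1, 0, 0, 0]
b = [3, 0, 0, 0, 0, 0, 1, 0]

conv = sparse_convolve(a, b)
heavy = support_at_least(conv, 2.0)
```

In this toy instance the convolution has four non-zero entries, and three of them reach the threshold 2.0, so k would be 3 under the assumed reading.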

arXiv.org e-Print Archive

Finding Favourite Tuples on Data Streams with Provably Few Comparisons

Author: Gionis Aristides; Tatti Nikolaj; Zhang Guangyi
Publication date: 06/07/2023

One of the most fundamental tasks in data science is to assist a user with unknown preferences in finding high-utility tuples within a large database. To accurately elicit the unknown user preferences, a widely adopted approach is to ask the user to compare pairs of tuples. In this paper, we study the problem of identifying one or more high-utility tuples by adaptively receiving user input on a minimum number of pairwise comparisons. We devise a single-pass streaming algorithm, which processes each tuple in the stream at most once, while ensuring that the memory size and the number of requested comparisons are in the worst case logarithmic in $n$, where $n$ is the number of all tuples. An important variant of the problem, which can help to reduce human error in comparisons, is to allow users to declare ties when confronted with pairs of tuples of nearly equal utility. We show that the theoretical guarantees of our method can be maintained for this important problem variant. In addition, we show how to enhance existing pruning techniques in the literature by leveraging powerful tools from mathematical programming. Finally, we systematically evaluate all proposed algorithms over both synthetic and real-life datasets, examine their scalability, and demonstrate their superior performance over existing methods. Comment: To appear in KDD 202
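The comparison-based setting described in this abstract can be sketched with a naive baseline. This is not the authors' method: it keeps a single champion tuple and asks one comparison per incoming tuple, so it uses n - 1 comparisons where the abstract reports a logarithmic bound. The names naive_streaming_best, prefers, and hidden_weights are hypothetical, and the linear utility standing in for the user is an assumption made only so the example runs.

```python
def naive_streaming_best(stream, prefers):
    """Single pass over the stream, keeping one champion tuple in memory.

    `prefers(a, b)` plays the role of the user: True if a is preferred to b.
    Returns the final champion and the number of comparisons asked.
    """
    best = None
    comparisons = 0
    for item in stream:
        if best is None:
            best = item
            continue
        comparisons += 1
        if prefers(item, best):
            best = item
    return best, comparisons


# Simulated user with a hidden linear utility (an assumption for illustration).
hidden_weights = (0.7, 0.3)


def utility(t):
    return sum(w * x for w, x in zip(hidden_weights, t))


def prefers(a, b):
    return utility(a) > utility(b)


tuples = [(0.2, 0.9), (0.8, 0.1), (0.6, 0.7), (0.4, 0.4)]
best, asked = naive_streaming_best(iter(tuples), prefers)
```

The counter makes the cost of the baseline visible: every tuple after the first costs the user one question, which is the quantity the streaming algorithm in the abstract aims to reduce.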

arXiv.org e-Print Archive

Ranking with submodular functions on a budget

Author: Gionis Aristides; Tatti Nikolaj; Zhang Guangyi
Publication date: 01/01/2022

Submodular maximization has been the backbone of many important machine-learning problems, and has applications to viral marketing, diversification, sensor placement, and more. However, the study of maximizing submodular functions has mainly been restricted to the context of selecting a set of items. On the other hand, many real-world applications require a solution that is a ranking over a set of items. The problem of ranking in the context of submodular function maximization has been considered before, but to a much lesser extent than item-selection formulations. In this paper, we explore a novel formulation for ranking items with submodular valuations and budget constraints. We refer to this problem as max-submodular ranking (MSR). In more detail, given a set of items and a set of non-decreasing submodular functions, where each function is associated with a budget, we aim to find a ranking of the set of items that maximizes the sum of values achieved by all functions under the budget constraints. For the MSR problem with cardinality- and knapsack-type budget constraints we propose practical algorithms with approximation guarantees. In addition, we perform an empirical evaluation, which demonstrates the superior performance of the proposed algorithms against strong baselines. Peer reviewed
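The MSR objective described in this abstract can be illustrated on a tiny instance. The sketch below assumes one particular reading of a cardinality-type budget: a function with budget b is evaluated on the first b items of the ranking. That reading, the coverage functions, and the names coverage and msr_value are assumptions for illustration. The exhaustive search over permutations is only a stand-in for small inputs and is not one of the approximation algorithms the abstract refers to.

```python
from itertools import permutations


def coverage(topics_of, wanted):
    """Build a non-decreasing submodular function: number of wanted topics covered."""
    def f(items):
        covered = set()
        for item in items:
            covered |= topics_of[item] & wanted
        return len(covered)
    return f


def msr_value(ranking, functions_with_budgets):
    """Sum of each function's value on the prefix of the ranking its budget allows."""
    return sum(f(ranking[:budget]) for f, budget in functions_with_budgets)


# Three items, each covering some topics.
topics_of = {"a": {1, 2}, "b": {2, 3}, "c": {4}}

# Two functions: one sees only the top item, the other sees the top two.
users = [
    (coverage(topics_of, {1, 2, 3}), 1),
    (coverage(topics_of, {3, 4}), 2),
]

# Exhaustive search over all rankings (feasible only for tiny instances).
best = max(permutations(topics_of), key=lambda r: msr_value(r, users))
```

Here the ranking that places "b" first and "c" second serves both functions well, which shows why the order of items matters and not just the selected set.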

arXiv.org e-Print Archive

PubMed Central

Helsingin yliopiston digitaalinen arkisto